Cell Genomics

Elsevier BV

Preprints posted in the last 90 days, ranked by how well they match Cell Genomics's content profile, based on 162 papers previously published here. The average preprint has a 0.22% match score for this journal, so anything above that is already an above-average fit.

1
Mismatch tolerance of a gRNA for CRISPR-based gene activation confers broad activity critical for cell reprogramming

Reisman, S. J.; Zhu, W.; Miller, S. E.; Halabi, D.; Sangvai, N.; Crawford, G. E.; Gordan, R.; Gersbach, C. A.

2026-02-03 genomics 10.64898/2026.02.01.703129 medRxiv
Top 0.1%
Match score: 38.4%

CRISPR activation and interference systems (CRISPRa/i) are widely used for programmable transcriptional control. Although these technologies are capable of highly specific single-gene activity, some applications of transcriptional network reprogramming require broad, genome-wide effects. Here, we identify a CRISPRa gRNA that robustly reprograms astrocyte transcriptional state. Unexpectedly, this activity arises from extensive off-target binding that induces expression changes in thousands of genes, unlike neighboring gRNAs targeting the same intended on-target site. We leverage this promiscuous gRNA to dissect determinants of gRNA-driven off-target dCas9 binding in the context of transcriptional reprogramming. Using ChIP-seq, high-throughput protein-binding microarrays, and gRNA-variant library screening in cells, we demonstrate that PAM-proximal bases are primary determinants of genomic binding, mismatch tolerance is both gRNA- and base-specific, and targeted mutations within the PAM-proximal region can tune gRNA specificity. We further demonstrate that CRISPRa-driven phenotypes can reflect combined contributions from widespread off-target activity and dose-dependent on-target effects. These findings highlight the potentially widespread impacts of CRISPRa off-target activity, underscore the need to account for cryptic effects when selecting and evaluating gRNAs for programming cell phenotypes, and demonstrate that multi-site binding by CRISPRa systems can be exploited as a feature for network-level perturbations in cell reprogramming.

2
Genomic Disaggregation Reveals Distinct Admixture Patterns and Cardiometabolic Risk Loci in Black Hawaiians

Vand, K.; Badia, N.; Khotchouk, B.

2026-01-26 genomics 10.64898/2026.01.24.701518 medRxiv
Top 0.1%
Match score: 32.9%

Background: The systematic aggregation of distinct admixed subpopulations into broad racial categories creates genomic blind spots that undermine the promise of precision medicine. Black Hawaiians (BH) exemplify this exclusion. They are characterized by a unique tri-continental ancestry (African, European, and Native Hawaiian/Pacific Islander [NHPI]) and a disproportionate cardiometabolic burden, yet their population-specific risk drivers remain masked by systematic conflation with broader ancestral cohorts.
Methods: We performed the first comprehensive genomic analysis of 287 BH participants from the NIH All of Us Research Program using whole-genome sequencing (WGS). Following haplotype phasing (SHAPEIT5), we characterized population structure (ADMIXTURE, PCA), inferred local ancestry tracts (RFMix), and reconstructed demographic history (SMC++). Genome-wide allele frequency differentiation (AFD) was calculated against tri-continental reference panels, and electronic health record (EHR) data were integrated to quantify the population's cardiometabolic burden.
Results: The cohort exhibited complex tri-continental admixture (mean: 67.0% African, 22.1% European, 10.9% NHPI) with high inter-individual heterogeneity. Phenotypic analysis confirmed a substantial disease burden (34.8% hypertension, mean BMI 31.2 kg/m²), while SMC++ reconstruction revealed a sharp demographic bottleneck in recent generations. Genome-wide AFD analysis of 8.9M variants demonstrated systematic differentiation (mean Δ vs. African: 0.041, NHPI: 0.069, European: 0.084). The top 100 differentiated variants mapped to 31 unique genes, identifying distinct candidates including MYO9A, RAB37, and PEAR1. Notably, differentiation in the cytoskeletal regulator MYO9A suggests a mechanostructural etiology for kidney disease distinct from classical APOL1 cytotoxicity, while PEAR1 variants implicate population-specific pharmacogenomic resistance to antiplatelet therapy.
Conclusion: This study highlights the importance of data disaggregation in genomic research, using the Black Hawaiian population as a paradigmatic example. By distinguishing this community from broader aggregate groups, we uncovered a distinct genomic architecture with unique admixture patterns that drive specific cardiometabolic risks. These findings demonstrate the necessity of granular resolution for achieving equitable precision medicine.

3
Phenome-derived polygenic scores and social determinants jointly shape context-dependent disease risk

Wang, Y.; Truong, B.; Lu, W.; Fadil, C.; He, Y.; Luo, W.; Koyama, S.; Tsuo, K.; Paruchuri, K.; Yu, Z.; Hull, L. E.; Zheng, Z.; Carey, C. E.; Walters, R. K.; Neale, B. M.; Robinson, E. B.; Kraft, P.; Natarajan, P.; Martin, A. R.

2026-04-18 genetic and genomic medicine 10.64898/2026.04.16.26351039 medRxiv
Top 0.1%
Match score: 26.7%

Polygenic scores (PGS) are typically derived from single-trait genome-wide association studies (GWAS), yet many complex diseases arise from shared genetic liability distributed across correlated clinical dimensions. Accordingly, disease risk depends not only on how genetic liability is represented but also on the social context in which that liability is expressed. Whether phenome-derived latent factors improve prediction, and how social determinants of health (SDoH) modify the realized utility of PGS, remains unclear. Here we constructed PGS for 35 orthogonal latent phenomic factors derived from 2,772 phenotypes in 361,114 UK Biobank (UKB) participants and evaluated their phenomic specificity, cross-dataset portability and predictive performance relative to conventional disease-specific PGS across the UKB holdout, Mass General Brigham Biobank and the All of Us (AoU) Research Program. Factor-based PGS showed widespread, biologically coherent phenome-wide associations that were reproducible across biobanks and ancestries. Their predictive utility, however, was strongly disease dependent. For asthma, a respiratory factor PGS outperformed an internally derived disease-specific PGS and showed superior cross-ancestry portability, retaining 41.5% of European-ancestry predictive accuracy in African-ancestry individuals, compared with 22.9% for an asthma PGS derived from the largest available multi-ancestry GWAS. By contrast, disease-specific PGS remained superior for coronary artery disease (CAD) and type 2 diabetes (T2D). These findings suggest that phenome-derived aggregation is most beneficial when disease-specific GWAS incompletely capture underlying liability, including settings of biological heterogeneity or imprecise phenotyping. We then evaluated SDoH in AoU as a complementary axis shaping prevalent disease prediction beyond genetic susceptibility. 
Across all three diseases, SDoH contributed substantial and largely independent predictive information beyond the disease-optimal genetic model. SDoH also modified how genetic liability translated into observed disease prevalence: for asthma and CAD, genetic stratification attenuated with increasing social burden, whereas this attenuation was substantially weaker for T2D. As a result, the same genetic percentile corresponded to different standardized predicted prevalences across social strata, reflecting disease-specific shifts in baseline prevalence, genetic gradients and calibration. Together, these findings indicate that disease risk is shaped by both genetic liability and the social context in which that liability is realized. Phenome-derived PGS improve prediction under specific architectural conditions, whereas social context independently modifies the performance, calibration and interpretation of genetic risk across populations.

4
Substantial genomic and methylation variability between MCF-7 sublines

Atanda, H. C.; Ewing, A. D.

2026-02-19 genomics 10.64898/2026.02.17.706500 medRxiv
Top 0.1%
Match score: 22.9%

Cancer cell lines have long been used as in vitro models for molecular assays in diagnostic and therapeutic development because they are accessible and well controlled. MCF-7 is the most widely studied cell line in human breast cancer research, and its sublines have been reported to exhibit clonal, cytogenetic, and transcriptomic variability. However, allele-specific methylation alterations in cancer genomes remain inadequately explored, largely due to limitations in sequencing methods. Here, we applied nanopore sequencing technology to characterise the genomic and epigenomic landscapes of two MCF-7 sublines. We identified global and local DNA methylation differences, as well as structural variants (SVs) and single-nucleotide variants (SNVs), between and within the sublines. Our analysis revealed substantial divergence in methylation patterns between the sublines, with ~3% of the differentially methylated regions (DMRs) overlapping known cancer driver genes. These DMRs overlap breast cancer-associated genes, including ERBB2, CDH1, SALL4, GATA2, GATA3, HMGA2, and FBLN2. We find that the majority of differentially methylated sites are explained by differential allelic methylation, and that allele-specific DMRs often fall at points where antisense non-coding RNAs overlap protein-coding genes. Transposable elements in both sublines also showed distinct methylation profiles: one subline had hypomethylated L1 elements relative to the other, which correlated with differences between the sublines in the amount of apparent insertional mutagenesis attributable to L1. Our study demonstrates the utility of nanopore sequencing in providing novel insights into genomic and methylomic differences within cell lines and into the nature of differential allelic methylation.

5
Bridging the Genomic Equity Gap with Context-Enhanced Risk Stratification in American Indians: the Strong Heart Study

Du, J.; Horimoto, A. R. V. R.; Best, L. G.; Zhang, Y.; Cole, S.; Umans, J. G.; Franceschini, N.; Sun, Q.

2026-02-11 genetic and genomic medicine 10.64898/2026.02.08.26345859 medRxiv
Top 0.1%
Match score: 22.5%

Polygenic scores (PGS) show promise for disease risk stratification but suffer from limited portability across populations. American Indians face a disproportionate burden of cardiovascular disease yet remain significantly underrepresented in genomic research, limiting equitable access to precision medicine. Here, we evaluate whether integrating specific lifestyle and clinical context variables with PGS enhances risk prediction for cardiometabolic traits in 424,622 European-ancestry participants from the UK Biobank (UKB) and 3,157 American Indian participants from the Strong Heart Study (SHS). By comparing genetics-only models with full models incorporating context variables and gene-context interactions across blood pressure traits, coronary heart disease (CHD), and stroke, we found that integrating context variables significantly improved prediction accuracy in both cohorts. Notably, for American Indian participants, the new model incorporating context variables and gene-context interactions significantly improved discrimination for CHD compared with an established clinical risk model. These findings suggest that modeling the interplay between inherited risk and modifiable factors can recover predictive power lost to imperfect PGS transferability, offering a viable pathway toward more equitable and effective precision medicine for underrepresented populations.

6
Single-cell CRISPR activation screens in primary B cells discover gene regulatory mechanisms for hundreds of autoimmune risk loci

Kriachkov, V.; Ching, J. W. H.; Lancaster, J.; Vespasiani, D.; Denny, N.; Hamley, J. C.; Gubbels, L.; Bandala Sanchez, E.; Neeland, M.; Levi, E.; Davies, K.; Shanthikumar, S.; Shevchenko, G.; Bryant, V. L.; Hodson, D. J.; Davies, J. O. J.; King, H. W.

2026-03-02 genomics 10.64898/2026.03.01.708923 medRxiv
Top 0.1%
Match score: 22.2%

Genome-wide association studies (GWAS) have discovered thousands of genetic variants linked to autoimmune disease, yet the molecular pathways underlying autoimmunity have remained elusive. A key challenge is that >90% of identified GWAS risk loci lie in non-coding genomic regions, making it difficult to predict their relevance to disease. Here, we curated fine-mapped non-coding risk variants from over 30 different autoimmune traits, including common conditions such as systemic lupus erythematosus (SLE), Crohn's disease, and multiple sclerosis, and reveal shared genetic signatures across diverse autoimmune diseases. We then performed a high-throughput single-cell multi-omic CRISPR activation screen targeting 763 autoimmune risk loci in primary human B cells (a cell type highly relevant to autoimmune diseases) and discover 524 cis-regulatory target gene effects for 378 risk loci, with many risk loci regulating multiple gene targets. This Single Cell Analysis of Non-coding Distal Autoimmune Loci (SCANDAL) provides a powerful experimental resource linking non-coding risk loci to many disease-relevant genes, including lowly expressed cytokines and transcription factors whose perturbation effects can be difficult to quantify with other CRISPR-based strategies. We reveal how increased transcriptional activity at one non-coding risk locus can drive transcription at other risk loci within the same regulatory landscape, which may help explain the genetic pleiotropy of autoimmune diseases. Finally, we quantified allele-specific effects on target gene expression with massively parallel reporter assays and prime editing, discovering a gain-of-function variant associated with SLE that controls expression of the transcription factor REL/cREL, which in turn binds dozens of risk loci and target genes associated with different autoimmune diseases.
Our study provides a valuable resource linking non-coding risk loci with their cis-regulatory target genes and advances our understanding of the shared genetic networks and mechanisms involved in autoimmunity.

7
Improving isoform-level eQTL and integrative genetic analyses of breast cancer risk with long-read RNA transcript assemblies

Head, S. T.; Nemani, A.; Chang, Y.-H.; Harrison, T. A.; Bresnahan, S. T.; Rothstein, J. H.; Sieh, W.; Lindstroem, S.; Bhattacharya, A.

2026-03-23 genomics 10.64898/2026.03.22.713514 medRxiv
Top 0.1%
Match score: 21.9%

Most eQTL and TWAS analyses quantify expression using aggregate, tissue-agnostic transcript annotations and ignore isoform-level regulation, potentially obscuring or misattributing regulatory mechanisms. Here, we developed a framework leveraging publicly available long-read RNA-seq data to perform tissue-informed inference of genetic regulation and prioritize candidate causal isoforms for breast cancer risk. We quantified gene- and isoform-level expression in breast tumor (TCGA), non-cancerous mammary tissue, and cultured fibroblasts (GTEx) using three transcriptome annotations: standard GENCODE, tissue-specific long-read-derived assemblies, and combined annotations incorporating transcript-isoforms from both. While GENCODE cataloged over 250,000 pan-tissue isoforms, the tissue-specific long-read assemblies captured reduced sets of 74,717 isoforms in tumor, 48,057 in fibroblasts, and 22,941 in healthy breast. We performed eQTL mapping and fine-mapping, followed by colocalization with overall and subtype-specific breast cancer GWAS and isoform-level TWAS. While most eGenes were concordant across annotations, approximately 1/3 of lead cis-eQTLs for shared eGenes differed between long-read assemblies and GENCODE. Further, eIsoform discovery was highly annotation-specific. In healthy breast tissue, the gold standard tissue for building gene expression prediction models for TWAS of breast cancer, 46% of eIsoforms identified by the long-read annotation were unique to that annotation even though 93.7% of them are present in GENCODE. Despite combined annotations expanding the GENCODE catalog by only 0.6-7.6% depending on tissue source, 69% of unique significant isoform-trait associations were specific to a single annotation. 
Long-read-informed annotations uncovered regulatory associations entirely missed by GENCODE, including a candidate regulatory isoform at the MARK1 locus captured only in fibroblasts and a previously unannotated splice variant prioritized as the likely effector transcript at NUP107. These findings demonstrate that transcript annotation is not merely a technical consideration but critically defines the biological hypothesis space for regulatory mechanisms and shapes discovery. Incorporating tissue-resolved isoform annotations from long-read RNA-seq improves the specificity of regulatory inference and enhances identification of candidate causal isoforms at GWAS loci.

8
De Novo Variation in Autism by Sex and Diagnostic Status in 41,367 Parent-Child Trios

Turner, T. N.

2026-01-29 genetic and genomic medicine 10.64898/2026.01.26.26344889 medRxiv
Top 0.1%
Match score: 18.6%

Autism shows a consistent sex bias, yet how sex shapes de novo variant (DNV) risk across coding and noncoding sequence remains unclear. We analyzed DNVs in 41,367 sequenced parent-child trios from three autism family-based cohorts and compared DNV characteristics and enrichment patterns in males and females. Importantly, the trios included offspring both with and without autism. We developed a new sex-aware DNV caller and performed an intensive, feature-based investigation of each candidate DNV to produce a high-confidence callset. We identified enrichment of missense and loss-of-function (LOF) DNVs both overall and within known autism-related genes (i.e., SFARI genes). Gene-specific enrichment analyses revealed twelve genes that reached exome-wide significance only in males (FOXP1, SMAD6, AUTS2, CCDC168, PIEZO1, EML6, ZNF84, IGSF23, OTOG, SLC6A1, GIGYF1, and FREM3) and four genes that reached significance only in females (TAOK1, MECP2, DDX3X, and TBL1XR1) within a variant class. Direct comparison of DNVs in males and females revealed GABBR2 as the only gene trending toward enrichment in males with autism relative to females with autism. Finally, we analyzed promoters and identified a single significant promoter region (p = 3.8 × 10⁻¹³), associated with the WDR74 gene, with the signal driven by DNVs observed in males with autism. Surprisingly, the noncoding RNA gene RNU2-2 lies within this significant WDR74 promoter and accounted for most of the DNVs in the region. RNU2-2 DNVs were present in 0.2% of males with autism, and several are predicted to alter RNA folding. We also observed RNU2-2 DNVs in 0.2% of females with autism, including two DNVs that were recurrent (i.e., shared) with unrelated, affected males. Notably, RNU2-2 DNVs were detected in 0.1% of unaffected males and were not observed in unaffected females.
Together, these results suggest that although RNU2-2 does not show a sex bias, it contributes to autism risk, which is intriguing given a prior study implicating RNU2-2 in a severe neurodevelopmental disorder.

9
Ancestral and environmental diversity shape the immune landscape in Indonesia

Fachrul, M.; Sukonthamarn, P.; Kusuma, P.; Novita, M. M.; Alvim, I.; Apriyana, I.; Crenna-Darusallam, C.; Christian, A.; Groudko, A.; Kendle, R.; Limardi, P. C.; Mee, E. D.; Oktavianthi, S.; Peter, L. M.; Priliani, L.; Utami, B. L.; Sokoy, F.; Frank, S. A. K.; Wihandani, D. M.; Dewi, N. N. A.; Darwinata, A. E.; Cox, M. P.; Banovich, N. E.; Sudoyo, H.; Malik, S. G.; Gallego Romero, I.

2026-02-15 genomics 10.64898/2026.02.15.704933 medRxiv
Top 0.1%
Match score: 18.4%

Island Southeast Asia (ISEA) remains consistently underrepresented in human genomic resources despite its exceptional ancestral and lifestyle diversity. The interplay between the region's complex population history and its environmental variation provides a window into how ancestry and environment jointly shape the human immune system. Here we report the generation of single-cell PBMC profiles from 199 Indonesians sampled across four communities on the islands of Bali and New Guinea. These groups capture diversity in regional genetic ancestries (East Asian-like and Papuan-like) and lifestyle contrasts (urban versus rural communities in Bali; highland versus lowland communities in New Guinea). We identify over 4,000 expression quantitative trait loci (eQTLs) across nine immune cell types, including eQTLs driven by introgression from both Neanderthals and Denisovans at genes such as IL7R, HLA-E, and STAT2. We also find evidence of local ancestry driving gene-by-environment interactions at pathogen receptors such as MARCO, although the majority of gene-by-environment interactions are not driven by differences in genetic structure between populations. Beyond direct genetic effects, we construct gene co-expression networks that consistently identify environmental signatures, as well as T-cell receptor repertoires that distinguish specific communities, with excess representation of interferon-stimulated genes in rural but not urban samples. This work establishes a framework for population-aware functional genomics in understudied regions and highlights how ancestral and environmental diversity jointly shape human immunity in this globally important yet underrepresented region.

10
Multiplex Portuguese Families as a Lens into Rare Mutations and the Shared Genetic Architecture of Schizophrenia, Mood Disorders, and Autism Spectrum Disorders

Pato, C. N.; Pato, M. T.; Mulle, J.; Hart, R. P.; Pang, Z.; Knowles, J. A.; Singh, T.; Maddhesiya, P.; Carvalho, C.; Merikangas, A.; Medeiros, H.; Bigdeli, T. B.; Kazemi, H.; Drake, J.; Vladimrov, V.; Maher, B.; Bacanu, S.-A.; Neale, B.; Fanous, A.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350177 medRxiv
Top 0.1%
Match score: 18.4%

In an analysis of 173 multiplex families from the Portuguese Island Collection (PIC), we characterize the shared genetic architecture of serious mental illnesses (SMI), including schizophrenia (SZ), bipolar disorder (BP), major depression (MDD), and autism (ASD). Within this cohort, co-segregation of psychotic and mood disorders occurred in 28% of families, while 7% demonstrated co-segregation of intellectual disability or ASD with SZ and mood disorder phenotypes. Whole-genome sequencing (WGS) was performed on a three-generation PIC family to identify rare, large-effect variants. We identified an extremely rare predicted loss-of-function (LoF) mutation in the Chromodomain Helicase DNA Binding Protein 2 (CHD2) gene. These results demonstrate that high-density multiplex families in founder populations are a powerful resource for mapping rare, large-effect variants that cross clinical diagnostic boundaries, as the identified CHD2 mutation suggests that disruption of a single neurodevelopmental gene may lead to diverse SMI phenotypes. By combining population and family-based methodologies, this approach leverages shared genetic backgrounds and environments to provide a unique opportunity for cellular studies to explore the biological mechanisms underlying SMI, offering significant potential to inform future functional research and identify novel therapeutic targets.

11
A Single-Cell and Spatial 3D Multi-omic Atlas of Developing Human Basal Ganglia and Inhibitory Neurons

Heffel, M. G.; Xu, H.; Pastor-Alonso, O.; Li, X.; Baig, M. S.; Irfan Ghoor, R.; Li, R.; Kern, C.; Kum, J.; Zhang, Y.; Paino, J.; Tsai, M. J.; Tai, C.-Y.; Tucker, G.; Zhao, Z.; Hou, A.; von Behren, Z.; Bhade, M.; Li, S.; Sandoval, K.; Scholes, J.; Codrea, F.; Calimlim, J.; Liao, E. K.; Leung, G.; Kim, J.; Eskin, E.; Flint, J.; Cotter, J. A.; Pasaniuc, B.; Bintu, B.; Zhu, Q.; Mukamel, E. A.; Ernst, J.; Paredes, M. F.; Luo, C.

2026-01-29 genomics 10.64898/2026.01.28.702385 medRxiv
Top 0.1%
Match score: 18.4%

The human basal ganglia (BG), subcortical nuclei fundamental to motor regulation and cognitive modulation, are constructed from neurons produced during gestation in the adjacent ganglionic eminences (GEs). GEs are transient structures in the ventral prenatal brain that also generate GABAergic inhibitory neurons, which migrate to the BG, cortex, and other destinations. This study aims to elucidate the epigenomic and 3D-genomic dynamics involved in the specification and maturation of GEs and GE-derived neurons, using single-nucleus methyl-3C sequencing (snm3C-seq), highly multiplexed spatial transcriptomics, and chromatin+RNA single-molecule imaging. Our multimodal data support a heterogeneous temporal progression across GE subregions, with the lateral GE (LGE) showing declining neurogenic activity in mid-gestation and the caudal GE (CGE) exhibiting ongoing developmental progression through infancy. We identified regulatory programs that specify subtypes of medium spiny neurons (MSNs), the principal cells of the BG, via synchronized maturation of the 3D epigenome. In infant brains, we found a transient short-range enriched (SE) chromatin conformation during the transition from oligodendrocyte progenitors (OPCs) to oligodendrocytes (ODCs), and a temporary shift toward a long-range enriched (LE) chromatin conformation in projection neurons, extending previous work showing that the differentiation of neurons and glial cells is associated with permanent SE and LE conformations, respectively. Lastly, we found that gene regulatory regions active in MSNs were enriched in loci associated with genetic risk for neuropsychiatric disease. Our study delineates the highly complex, lineage-specific 3D genomic dynamics in ventral progenitors and basal ganglia populations of the perinatal human brain.
Highlights
- Joint 3D genome and DNA methylome analysis of ventral brain progenitor zones
- Heterogeneous developmental progressions of the ganglionic eminences
- Distinct developmental dynamics and regulatory landscape of MSNs and interneurons
- Transient remodeling of the 3D genome in neurons and oligodendrocyte progenitors

12
Implementation of the genome-informed risk assessment (GIRA) may lead to large disruptions to the health system

Lapinska, S.; Li, X.; Mandla, R.; Shi, Z.; Tozzo, V.; Flynn-Carroll, A.; Ritchie, M. D.; Rader, D. J.; Penn Medicine Biobank; Pasaniuc, B.

2026-02-27 genetic and genomic medicine 10.64898/2026.02.25.26347123 medRxiv
Top 0.1%
Match score: 18.2%

The Genome Informed Risk Assessment (GIRA) report from eMERGE has become a standard approach to implement genomic precision medicine at scale. Here, we assess GIRA's utility and impact in a health care system independent of eMERGE, focusing on 9 adult conditions using the Penn Medicine Biobank (PMBB, n=48,279). We find that a large number of patients, 50.1% (n=24,185), were deemed by GIRA to be high-risk for at least one of the 9 conditions, with 30.4% (n=14,676) due to polygenic and/or monogenic risk. Stratifying by ancestry revealed significant differences in high-risk proportions, with higher rates in African/African American (AFR) individuals (56.6% vs. 50.1%, p = 7.43 × 10⁻³⁶) and lower rates in East Asian (42.0%) and South Asian (40.0%) individuals. Increased high-risk rates were observed in the lowest quartile of the social deprivation index, highlighting the influence of environmental factors and access to care on GIRA's utility. GIRA was a good predictor of prevalent cases (in line with the eMERGE GIRA reported results), but incident case prediction was substantially attenuated for 5 of the 9 conditions (e.g., OR of 2.36 vs. HR of 1.31 for atrial fibrillation (AFIB)). The demographic composition of high-risk patients differed from that of incident cases for some conditions; for example, individuals at high risk for AFIB were enriched for European ancestries, whereas incident AFIB cases were enriched for AFR ancestries. Overall, our results show the accuracy of GIRA as a biomarker to stratify high-risk patients for precision medicine and highlight the challenges that implementing it at scale would pose for the health system.

13
Integrating multi-omic QTLs and predictive models reveals regulatory architectures at immune-related GWAS loci in CD4+ T cells

Matos, M. R.; Ghatan, S.; Bankier, S.; Thompson, T. V.; Lundy-Perez, K.; Suzuki, M.; Dona-Termine, R.; Stauber, J.; Reynolds, D.; Rosales, K.; Griffen, A.; Isshiki, M.; Simpson, D.; Ahmed, O.; Gold, S.; Ostrowiak, S. R.; Raj, S.; Milman, S.; Lappalainen, T.; Greally, J. M.

2026-01-30 genetic and genomic medicine 10.64898/2026.01.27.26344979 medRxiv
Top 0.1%
Match score: 18.2%

Functional interpretation is essential for understanding how genetic variants contribute to complex traits. Here, we identified and characterized regulatory variants in CD4+ T cells collected from 362 donors. We integrated molecular QTL mapping from single-cell RNA-seq profiles and chromatin accessibility with predicted variant effects from a deep learning model trained on chromatin accessibility data. We identified molecular features and transcription factor binding mechanisms underlying variant sharing and mediated effects across the modalities and approaches. While predicted variant effects correlated with molQTLs, only a small fraction of empirically detected molQTLs were discovered by predictive models. MolQTLs, primarily those affecting chromatin, indicated potential molecular drivers for 33% of immune-related GWAS loci, with the deep learning approach providing insights into 4.7% of GWAS loci. These results highlight the value of multi-omic data and systematic integration of empirical and predictive approaches to interpret regulatory effects of genetic variants.

14
Single-cell multiomic profiling of lung immune cells identifies novel asthma risk genes and cell-type specific functions

GU, J.; Decker, D. C.; Zhong, X.; Sperling, A. I.; Ober, C.; Nobrega, M. A.; HE, X.; Schoettler, N.

2026-02-09 genetic and genomic medicine 10.64898/2026.02.05.26345013 medRxiv
Top 0.1%
Match score: 18.1%

Genome-wide association studies (GWAS) of asthma have identified nearly 200 genomic loci. However, the underlying mechanisms remain mostly elusive. While functional profiling of blood immune cell types has helped interpret asthma GWAS signals, high-resolution functional genomic data for lung immune cells, which differ from circulating immune cells, are lacking. We therefore generated single-cell multiomic profiles (RNA-seq and ATAC-seq) of lymphocytes from lung and spleen tissues of 9 donors. Cross-tissue comparison revealed distinct transcriptomes for each immune cell type but only subtle differences in chromatin accessibility. We next assessed open chromatin regions (OCRs) of lung vs. blood, using a public dataset, for their enrichment of asthma risk. Strikingly, lung T cells showed unique contributions to the heritability of adult-onset (AOA) and childhood-onset asthma (COA), beyond those of blood T cells. Using lung OCRs and previously fine-mapped variants for AOA and COA, we identified 43 cis-regulatory elements (CREs) likely contributing to asthma risk. By creating enhancer-gene maps from our single-cell data, we identified target genes for these CREs. We highlight CCR4 and LRRC32, whose CREs display cell-type-specific regulatory activity. Lastly, we built cell-type-level gene regulatory networks (GRNs) to identify target genes of transcription factors (TFs). Lung GRNs not only shed light on the cell-type-specific functions of several TFs that are known asthma risk genes, but also allowed us to detect novel TFs, such as STAT1, that may regulate asthma-related biological pathways in CD4 T cells. Our results demonstrate the utility of single-cell multiomics to identify asthma risk genes and understand their cell-type-specific functions.

15
Multi-ancestral GWAS with the VA Million Veteran Program enables functional interpretation of rheumatoid arthritis alleles

Sakaue, S.; Yang, D.; Zhang, H.; Posner, D.; Rodriguez, Z.; Love, Z.; Cui, J.; Budu-Aggrey, A.; Ho, Y.-L.; Costa, L.; Monach, P.; Huang, S.; Ishigaki, K.; Melley, C.; Tanukonda, V.; Sangar, R.; Maripuri, M.; Sweet, S. M.; Panickan, V.; McDermott, G.; Hanberg, J. S.; Riley, T.; Laufer, V.; Okada, Y.; Scott, I.; Bridges, S. L.; Baker, J.; VA Million Veteran Program; Wilson, P. W.; Gaziano, J. M.; Hong, C.; Verma, A.; Cho, K.; Huffman, J. E.; Cai, T.; Raychaudhuri, S.; Liao, K. P.

2026-04-23 genetic and genomic medicine 10.64898/2026.04.22.26351423 medRxiv
Top 0.1%
18.1%

Rheumatoid arthritis (RA) is a common, heritable autoimmune condition. To date, most genetic associations have been derived from individuals of European or East Asian ancestry. Here, we applied a multimodal automated phenotyping strategy to define RA and performed a genome-wide association study (GWAS) of RA in the Million Veteran Program (MVP), including the underrepresented African American (AFR) and Admixed American (AMR) populations. Meta-analyses with previous RA cohorts identified 152 autosomal genome-wide significant loci, 31 of which were novel. Inclusion of multi-ancestry data dramatically improved fine-mapping resolution. Functional characterization of these loci using single-cell transcriptomic and chromatin data suggested new RA genes such as CHD7 and CD247. We identified underappreciated functional roles for fine-grained immune cell states other than T cells, such as B cell and myeloid cell states. Multi-ancestry polygenic risk scores built from our data showed better predictive ability, especially in the AFR and AMR populations.

16
Transposable element-host genome evolutionary arms race revealed by multi-modal epigenomic profiling in a telomere-to-telomere human genome reference

Nikitin, D.

2026-03-23 genomics 10.64898/2026.03.19.712972 medRxiv
Top 0.1%
18.0%

For a quarter of a century, transposable elements have been recognized as a major component of the human genome (46.1% of it, according to recent estimates), as key drivers of regulatory innovation, and as participants in an ongoing evolutionary arms race with host defense systems. Using the newly released T2T ENCODE dataset, we quantified the epigenetic impact of 3.7 million transposable elements across evolutionary time by analyzing seven epigenomic modalities in twelve human cell lines, spanning six transposon classes, 44 families, and 1,122 subfamilies. We show that SVA elements exhibit the strongest signatures of the arms race, characterized by progressive escape from H3K9me3-mediated heterochromatinization accompanied by increased acquisition of CTCF binding and enhancer-associated chromatin marks. Among Alu elements, the AluYb8 and AluYb9 subfamilies display age-dependent accumulation of CTCF binding, while seven LTR subfamilies (HERV16-int, MER11C, LTR43-int, HERVE-int, LTR22C, LTR5_Hs, HERVIP10FH-int) show dynamic evolutionary behavior in active chromatin, H3K9me3 chromatin, and CTCF contexts. We further evaluated the relative contribution of distinct epigenomic modalities to the host-transposable element conflict and found that transposon-driven evolution is dominated by evasion of host-imposed heterochromatinization, primarily at H3K9me3 and secondarily at H3K27me3, together with progressive invasion into CTCF-rich regions. In contrast, enhancer, promoter, and H3K36me3 marks appear to play more limited roles. Collectively, these findings deepen our insight into the coevolutionary epigenomic dynamics between the human genome and transposable elements and into the associated processes driving regulatory innovation.

17
Telomere length of both parents contribute to heritable POT1 cancer-predisposition syndrome

Martin, A.; Lu, R.; Blake, A.; Nichols, K. E.; Sanchez, S. E.; Artandi, S. E.; Sharma, R.; Hockemeyer, D.

2026-02-11 genetics 10.64898/2026.02.09.704652 medRxiv
Top 0.1%
17.1%

Germline mutations in POT1 are linked to familial cancer predisposition, and somatic POT1 mutations occur recurrently in tumors. These mutations promote oncogenesis by enabling aberrant telomere elongation. For inherited POT1 mutations, a critical question is the extent to which elongated telomeres are transmitted to the next generation from the POT1 carrier parent and whether the inherited hyper-elongated telomeres elevate cancer risk. Using a nanopore sequencing approach that provides haplotype-specific telomere length measurements, we examined telomere inheritance in families harboring POT1 mutations. We found that individuals preferentially inherit their longest telomeres from the carrier parent, consistent with extensive telomere elongation in the carrier germline, whereas their comparatively short telomeres originate from the non-carrier parent. Analysis of carrier and non-carrier siblings showed that both sets of parental telomeres are longer in POT1 carriers, yet the shortest non-carrier-derived telomeres undergo disproportionately greater elongation than those inherited from the carrier parent. This identifies a mechanism of genetic anticipation in which the inheritance of long telomeres from one parent drives excessive extension of shorter telomeres. These findings demonstrate that telomere length inherited from both parents jointly defines the telomere-based tumor suppressor mechanism.

Summary sentence: Allele-specific nanopore sequencing reveals that POT1 mutations reshape germline and somatic telomere dynamics, uncovering a novel mechanism of generational anticipation driven by preferential elongation of short inherited telomeres.

18
Structural variants contribute substantially to complex trait heritability

Nguyen, D. T.; Shadrin, A. A.; Parker, N.; Fuhrer, J.; Vo, N. S.; Dale, A.; Andreassen, O.; Frei, O.

2026-03-03 genetics 10.64898/2026.02.28.708732 medRxiv
Top 0.1%
14.4%

Despite accumulating evidence that structural variants (SVs) exert disproportionate functional effects, their genome-wide contribution to complex trait heritability has not yet been systematically quantified. Here, we introduce MiXeR-SV, a tool that integrates long-read-derived SVs from existing reference catalogs with genome-wide association summary statistics to quantify SV heritability and enrichment. Applying MiXeR-SV to 105 complex traits, we identify 31 traits with significant enrichment (Bonferroni-corrected P < 4.8 × 10⁻⁴), with SVs explaining up to 32% of total heritability despite comprising only 0.6% of the analyzed variants. Enrichment is most extensive in hematological, metabolic/biomarker, and cancer traits; trait-specific in neuropsychiatric and cardiometabolic phenotypes; and observed across all anthropometric traits, albeit with modest effect sizes, consistent with their highly polygenic architecture. These findings are robust across two independent SV reference panels built from complementary long-read and graph-based variant catalogs, with 93.5% of significant traits consistent between technologies. Furthermore, incorporating SV architecture into heritability models consistently increases total heritability estimates in enriched phenotypes, and SV heritability proportions correlate with the gap between SNP-based and twin heritability estimates across traits. Cross-ancestry replication in Biobank Japan confirms the enrichment patterns. Our quantification of SV contributions to genetic architecture has significant implications for genetic prediction and fine-mapping of human complex traits and common diseases.

19
Multi-omics Differential Inference for Functional Interpretation (MoDIFI): A Statistical Framework to Prioritize Cell Lines for Neurodevelopmental Variants

VR, A.; Shaw, G. T.-W.; Manuel, J.; Mosbruger, T. L.; Heins, H.; Ng, J. K.; Kim, H.; Hayeck, T. J.; Turner, T. N.

2026-01-29 genomics 10.64898/2026.01.29.702065 medRxiv
Top 0.1%
14.2%

Noncoding variants contribute to neurodevelopmental disorders (NDDs), but their regulatory effects are often cell-type specific, making it difficult to choose an in vitro model for high-throughput assays such as massively parallel reporter assays. We asked: given a set of noncoding variants, which cell line and regulatory regions are most likely to reveal measurable allele-specific effects? We generated matched multiomic profiles (RNA-seq, ATAC-seq, and Hi-C) under consistent conditions across commonly used NDD in vitro models: human neuronal lines (IMR-32, SH-SY5Y, and SK-N-SH), mouse neuronal lines (HT-22 and Neuro-2a), and a non-neuronal line (HEK-293). To integrate these orthogonal data types, we developed MoDIFI (Multi-omics Differential Inference for Functional Interpretation), a Bayesian framework that quantifies cell-line-specific regulatory activity by computing posterior inclusion probabilities (PIPs) for differential gene-loop interactions. MoDIFI identifies regulatory regions supported by coordinated 3D contacts, accessibility, and transcriptional output, producing cell-line-resolved regulatory maps that highlight both shared synaptic programs and context-dependent mechanisms. These results provide a practical strategy for prioritizing the most informative cell lines and candidate regulatory elements for targeted functional testing of NDD-relevant noncoding variation.

20
Reprogramming of neuronal genome function and phenotype by astrocytes

Li, B.; Hagy, K.; Safi, A.; Beer, M. A.; Barrera, A.; Geraghty, S.; Rai, R.; Pederson, A. N.; Reisman, S. J.; Love, M. I.; Sullivan, P. F.; Eroglu, C.; Crawford, G. E.; Gersbach, C. A.

2026-03-07 genomics 10.64898/2026.03.07.710282 medRxiv
Top 0.1%
13.9%

Heterotypic cell-cell interactions are critical in governing cellular physiology, disease progression, and responses to the environment and pharmacologic interventions. For example, neurons and astrocytes engage in intricate interactions that are essential for brain development and function [1-3]. However, how these extracellular signals are transformed into the epigenomic regulation that governs cell function is poorly understood. Here, we report that weeks of co-culture between human induced pluripotent stem cell (hiPSC)-derived neurons and mouse cortical astrocytes extensively reprograms gene expression and the chromatin accessibility landscape in neurons, affecting thousands of genes and putative gene regulatory elements (REs), including many transcription factors (TFs). These genes are enriched for functions implicated in neuronal differentiation and maturation, and tend to be implicated in schizophrenia and autosomal dominant Alzheimer's disease. Through complementary CRISPR interference and activation screens, we recapitulated hundreds of astrocyte-induced transcriptional and chromatin remodeling events in mono-cultured neurons at both promoters and distal REs of TF genes. We discovered functional REs for ~50 astrocyte-responsive TF genes, providing a map of gene regulatory network control. Astrocyte-responsive TF genes fall into groups that exert independent or counterbalancing transcriptional effects, highlighting the complex coordination of the neuronal response to astrocytes. Functional effects of specific TFs, including POU3F2 and TFAP2E, on neurite morphology and neuronal electrophysiology are consistent with their transcriptional effects, demonstrating the capacity of direct epigenetic control to mimic heterotypic cellular signals.
This work illuminates the regulation of neurodevelopment- and disease-relevant gene modules by neuron-astrocyte interactions, and provides a blueprint for applying modern functional genomics to uncover the links between cell microenvironment and epigenomic programming.

Highlights
- Neuronal gene expression and chromatin accessibility landscape are profoundly remodeled by astrocytes over weeks of co-culture
- Astrocyte-responsive neuronal gene modules and neuron-responsive astrocytic gene modules are enriched for genes associated with schizophrenia and familial Alzheimer's disease
- Single-cell CRISPR interference and activation screens of astrocyte-responsive gene regulatory elements identified dozens of functional regulatory elements of TF genes in neurons
- Single-cell CRISPR interference and activation screens of >200 astrocyte-responsive TF genes uncovered discrete functional clusters that promote neuronal maturity or stemness
- Astrocyte-responsive TF genes reprogram neuronal electrophysiology and neurite morphology